AITopics | low-rank bias

Neural collapse vs. low-rank bias: Is deep neural collapse really optimal?

Neural Information Processing Systems, Mar-22-2026, 22:18:10 GMT

Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC), and a growing body of work is currently investigating the propagation of neural collapse to earlier layers of DNNs -- a phenomenon called deep neural collapse (DNC). However, existing theoretical results are restricted to either linear models, the last two layers or binary classification. In contrast, we focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift. As soon as we go beyond two layers or two classes, DNC stops being optimal for the deep unconstrained features model (DUFM) -- the standard theoretical framework for the analysis of collapse. The main culprit is the low-rank bias of multi-layer regularization schemes. This bias leads to optimal solutions of even lower rank than that of neural collapse. We support our theoretical findings with experiments on both DUFM and real data, which show the emergence of the low-rank structure in the solution found by gradient descent.
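For orientation, the neural collapse geometry that this abstract compares against can be checked numerically. The sketch below assumes the usual description of NC, in which the K class means form a simplex equiangular tight frame (ETF). The helper name `simplex_etf` and the choice K = 4 are illustrative assumptions and are not taken from the paper. The frame has rank K - 1, which is the baseline rank that the lower-rank optima described above fall below.

```python
import numpy as np

def simplex_etf(num_classes: int) -> np.ndarray:
    """Return a K x K matrix whose columns form a simplex ETF (hypothetical helper)."""
    k = num_classes
    # Centering matrix I - (1/K) 11^T, rescaled so every column has unit norm.
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * centering

K = 4  # illustrative number of classes
M = simplex_etf(K)

# Gram matrix of the columns: ones on the diagonal, -1/(K-1) elsewhere.
gram = M.T @ M

# Explicit tolerance so the single zero singular value is not counted.
etf_rank = np.linalg.matrix_rank(M, tol=1e-10)

print("rank of the ETF:", etf_rank)
print("cosine between two distinct class means:", gram[0, 1])
```

The columns are equinormed and maximally separated, yet they span only K - 1 dimensions because they sum to zero.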

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Neural collapse vs. low-rank bias: Is deep neural collapse really optimal?

Neural Information Processing Systems, May-27-2025, 21:43:24 GMT

low-rank bias, neural collapse, optimal

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features

Garrod, Connall; Keating, Jonathan P.

arXiv.org Artificial Intelligence, Oct-30-2024

Modern deep neural networks have been observed to exhibit a simple structure in their final layer features and weights, commonly referred to as neural collapse. This phenomenon has also been noted in layers beyond the final one, an extension known as deep neural collapse. Recent findings indicate that such a structure is generally not optimal in the deep unconstrained feature model, an approximation of an expressive network. This is attributed to a low-rank bias induced by regularization, which favors solutions of lower rank than those typically associated with deep neural collapse. In this work, we extend these observations to the cross-entropy loss and analyze how the low-rank bias influences various solutions. Additionally, we explore how this bias induces specific structures in the singular values of the weights at global optima. Furthermore, we examine the loss surface of these models and provide evidence that the frequent observation of deep neural collapse in practice, despite its suboptimality, may result from its higher degeneracy on the loss surface.

analytic perspective, neural collapse, unconstrained feature, (2 more...)

arXiv.org Artificial Intelligence

2410.23169

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks

Chen, Ke; Yi, Chugang; Yang, Haizhao

arXiv.org Machine Learning, Oct-2-2024

We study the implicit bias towards low-rank weight matrices when training neural networks (NN) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with Stochastic Gradient Descent (SGD) and WD, its weight matrix is approximately a rank-two matrix. Empirically, we demonstrate that WD is a necessary condition for inducing this low-rank bias across both regression and classification tasks. Our work differs from previous studies as our theoretical analysis does not rely on common assumptions regarding the training data distribution, optimality of weight matrices, or specific training procedures. Furthermore, by leveraging the low-rank bias, we derive improved generalization error bounds and provide numerical evidence showing that better generalization can be achieved. Thus, our work offers both theoretical and empirical insights into the strong generalization performance of SGD when combined with WD.
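The phrase "approximately a rank-two matrix" can be made measurable. The sketch below is one possible diagnostic, not the paper's procedure: it reports the relative Frobenius error of the best rank-k approximation, which by the Eckart-Young theorem is determined by the discarded singular values. The function name `rank_k_residual` and the synthetic matrix `W` (an exact rank-two matrix plus small noise, standing in for a trained weight matrix) are assumptions made for illustration.

```python
import numpy as np

def rank_k_residual(weight: np.ndarray, k: int) -> float:
    """Relative Frobenius error of the best rank-k approximation of `weight`."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    # Eckart-Young: the squared error equals the energy in the discarded singular values.
    tail_energy = np.sum(singular_values[k:] ** 2)
    total_energy = np.sum(singular_values ** 2)
    return float(np.sqrt(tail_energy / total_energy))

rng = np.random.default_rng(0)

# Synthetic stand-in for a trained weight matrix: rank two plus small noise.
low_rank_part = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
W = low_rank_part + 1e-3 * rng.standard_normal((64, 32))

residual_rank1 = rank_k_residual(W, 1)
residual_rank2 = rank_k_residual(W, 2)

print("relative error of best rank-1 approximation:", residual_rank1)
print("relative error of best rank-2 approximation:", residual_rank2)
```

A sharp drop in the residual between k = 1 and k = 2, followed by a value near zero, is what an approximately rank-two matrix looks like under this diagnostic.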

generalization error, low-rank bias, neural network, (13 more...)

arXiv.org Machine Learning

2410.02176

Country:

North America > United States > California (0.05)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Delaware > New Castle County > Newark (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank

Wang, Zihan; Jacot, Arthur

arXiv.org Machine Learning, Sep-29-2023

The $L_{2}$-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks. In tasks such as matrix completion, the goal is to converge to the local minimum with the smallest rank that still fits the training data. While rank-underestimating minima can be avoided since they do not fit the data, GD might get stuck at rank-overestimating minima. We show that with SGD, there is always a probability to jump from a higher rank minimum to a lower rank one, but the probability of jumping back is zero. More precisely, we define a sequence of sets $B_{1}\subset B_{2}\subset\cdots\subset B_{R}$ so that $B_{r}$ contains all minima of rank $r$ or less (and not more) that are absorbing for small enough ridge parameters $\lambda$ and learning rates $\eta$: SGD has probability 0 of leaving $B_{r}$, and from any starting point there is a non-zero probability for SGD to enter $B_{r}$.
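One way to see why $L_{2}$ regularization of a deep linear network favours low rank is a known identity, given here as background rather than as a result of this paper: for sufficiently wide layers, the smallest value of $\sum_{\ell}\|W_{\ell}\|_{F}^{2}$ over factorizations $W_{L}\cdots W_{1}=A$ is $L\sum_{i}s_{i}(A)^{2/L}$, where $s_{i}(A)$ are the singular values of $A$. The sketch below only checks that a balanced factorization built from the SVD attains this value; it does not verify minimality. The function name `balanced_factors`, the matrix sizes and the depth are illustrative assumptions.

```python
import numpy as np

def balanced_factors(target: np.ndarray, depth: int) -> list:
    """Factor `target` as W_L ... W_1 (depth >= 2), splitting singular values evenly."""
    u, s, vt = np.linalg.svd(target, full_matrices=False)
    root = np.diag(s ** (1.0 / depth))
    inner = [root.copy() for _ in range(depth - 2)]
    # Returned in the order W_1, ..., W_L.
    return [root @ vt] + inner + [u @ root]

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 6))  # rank-3 target
depth = 3  # illustrative depth

factors = balanced_factors(A, depth)

# Multiply the layers back together: W_L ... W_1.
product = factors[0]
for layer in factors[1:]:
    product = layer @ product

# Sum of squared Frobenius norms, i.e. the weight-decay penalty without lambda.
penalty = sum(np.sum(layer ** 2) for layer in factors)

singular_values = np.linalg.svd(A, compute_uv=False)
schatten_value = depth * np.sum(singular_values ** (2.0 / depth))

print("penalty of balanced factorization:", penalty)
print("L * sum_i s_i^(2/L):", schatten_value)
```

Because $s^{2/L}$ is concave for $L>2$, this penalty penalizes many small singular values more heavily than a few large ones, and that pressure grows with depth.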

artificial intelligence, machine learning, matrix, (16 more...)

arXiv.org Machine Learning

2305.16038

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
